[POST GuideLLM Refactor] Multi-Turn Rework #374

sjmonson · 2025-09-26T21:01:44Z

Summary

Details

[ ]

Test Plan

Related Issues

Resolves #

"I certify that all code in this PR is my own, except as noted below."

Use of AI

Includes AI-assisted code completion
Includes code generated by an AI application
Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: guangli.bao <[email protected]>

Signed-off-by: psydok <[email protected]>

sjmonson · 2025-09-26T21:25:21Z

src/guidellm/scheduler/objects.py

+@SchedulerMessagingPydanticRegistry.register()
+class ScheduledRequestAugmentation(StandardBaseModel):
+    """
+    Adjustments to scheduler logic for a paired request.
+    """
+
+    post_requeue_delay: float = Field(
+        description=(
+            "Delay in seconds to wait after a request to "
+            "queue the next request in the conversation."
+        ),
+        default=0.0,
+    )
+
+


This could be part of `ScheduledRequestInfo`, but I figured it was not really necessary to pass it back and forth with each request update. I also thought this would be a good interface for adjusting the scheduling of individual requests.
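As a rough illustration of the interface described in this comment, the sketch below shows how a scheduler loop might honor `post_requeue_delay` between turns of one conversation. It uses a plain dataclass as a stand-in for the pydantic model; `Augmentation`, `run_conversation`, and the `echo` backend are hypothetical names for illustration, not GuideLLM APIs.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Augmentation:
    """Stand-in for ScheduledRequestAugmentation (hypothetical, stdlib only)."""

    post_requeue_delay: float = 0.0  # seconds to wait before queuing the next turn


async def run_conversation(turns, send):
    """Send each (request, augmentation) pair in order, pausing between turns."""
    responses = []
    for index, (request, aug) in enumerate(turns):
        responses.append(await send(request))
        # Only delay when another turn is still waiting to be queued.
        if index < len(turns) - 1 and aug.post_requeue_delay > 0:
            await asyncio.sleep(aug.post_requeue_delay)
    return responses


async def echo(request):
    # Hypothetical backend call: returns the request text upper-cased.
    return request.upper()


turns = [("hello", Augmentation(0.05)), ("again", Augmentation())]
started = time.monotonic()
responses = asyncio.run(run_conversation(turns, echo))
elapsed = time.monotonic() - started
```

Keeping the delay on a separate object, as the comment suggests, means the per-request status record does not have to carry scheduling hints on every update.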

sjmonson · 2025-09-26T21:30:11Z

src/guidellm/backends/openai.py

+    def _apply_history(
+        self,
+        request: GenerationRequest,
+        history: HistoryT[GenerationRequest, GenerationResponse],
+    ) -> GenerationRequest:
+        """
+        Apply conversation history to the current request.
+        """
+
+        def turn_to_text(turn: tuple[GenerationRequest, GenerationResponse]) -> str:
+            req, res = turn
+            return f"{req.content}{res.value}"
+
+        request.content = "".join(chain(map(turn_to_text, history), (request.content,)))
+        return request
+


Temporary hack until we land request templates.
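To make the effect of this hack concrete, here is a minimal standalone sketch of the same flattening. `Request` and `Response` are simplified stand-ins for `GenerationRequest` and `GenerationResponse`, and unlike the diff above this version returns a new object instead of mutating the request.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass
class Request:
    content: str


@dataclass
class Response:
    value: str


def apply_history(request, history):
    """Prefix the request with every earlier prompt and response, in order."""
    past = (f"{req.content}{res.value}" for req, res in history)
    return Request(content="".join(chain(past, (request.content,))))


history = [(Request("Q1 "), Response("A1 ")), (Request("Q2 "), Response("A2 "))]
flattened = apply_history(Request("Q3"), history)
print(flattened.content)  # Q1 A1 Q2 A2 Q3
```

Note that nothing separates turns or marks roles, so any spacing has to come from the turn text itself. That is presumably part of why request templates are the intended long-term fix.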

## Summary

Final pieces needed for image CI work. Fully enables auto `latest`, `stable` tags and old image pruning.

## Details

- Add `pipefail` to list-tags command to catch failures
- Add missing `ghcr.io/` to skopeo commands
- Disable dry-run option for development image cleanup job

## Test Plan

Ran with `workflow_dispatch` [see here](https://github.com/vllm-project/guidellm/actions/runs/18108553536)

<img width="2032" height="955" alt="2025-09-29T15-45-39" src="https://github.com/user-attachments/assets/b981ab01-fe90-4e15-bf60-cb483508065e" />
<img width="1204" height="579" alt="2025-09-29T15-46-02" src="https://github.com/user-attachments/assets/68118168-2e80-4d45-92cc-47badc1caf16" />

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Samuel Monson <[email protected]>


## Summary

It's inconvenient to look at metrics.

## Details

-

## Test Plan

- code launch

## Related Issues

- Resolves #371

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

## Summary

<img width="1757" height="1212" alt="image" src="https://github.com/user-attachments/assets/fbfddeac-ca56-40c0-b7ae-d2f17d50823a" />

## Details

- [ ]

## Test Plan

-

## Related Issues

- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

## Summary

With the default path now referring to the versioned build, users will no longer experience their HTML reports breaking randomly when the build files are updated. Also fixed a versioned build directory path issue that I missed previously.

---------

Signed-off-by: dalthecow <[email protected]>

## Summary

We want to use ITL instead of TPOT. The data we had previously happened to be ITL data, but all of the labels indicate that it is TPOT data. Now the code and labels reflect that it is ITL data.

## Test Plan

- Everything works, tests pass, no use of TPOT in the UI

---------

Signed-off-by: dalthecow <[email protected]>
Co-authored-by: Samuel Monson <[email protected]>

## TODO

- Docs
- ~CSV arg string support~ CSV arg string now supports single bucket (see last example). Might leave it at that for now.
- More validation

## Summary

This PR is a port of #287 to the v0.4.0 refactor branch. Adds controls for sharing one or more fixed prefixes between samples. See examples below.

## Details

Adds a `prefix_buckets` argument to the `SyntheticTextDatasetConfig`. Each bucket consists of a prefix count, token count, and bucket weight. Prefix count sets the number of unique prefixes to generate for a given bucket, token count is the length of each prefix in the bucket, and bucket weight is used to calculate the proportion of requests the bucket applies to relative to the sum of all bucket weights. Here are a few examples:

Here we have one bucket of 32 prefixes of length 2048. Since there are 1024 total samples, each prefix will apply to 32 samples. If there is only one bucket, then weight can be omitted as the bucket applies to 100% of samples.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 32
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024
```

In this modified version of the first example, 16 of the prefixes have 2048 tokens while the other 16 have 1024 tokens.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      prefix_count: 16
      bucket_weight: 50
    - prefix_tokens: 1024
      prefix_count: 16
      bucket_weight: 50
  prompt_tokens: 256
  output_tokens: 256
  samples: 1024
```

The prefix tokens of a bucket can also be 0 to disable prefixes for those samples. Here is an example where 40% of the samples have a prefix of 2048 tokens while the other 60% have no prefix.

```yaml
data:
  prefix_buckets:
    - prefix_tokens: 2048
      bucket_weight: 40
    - prefix_tokens: 0
      bucket_weight: 60
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
```

If only a single bucket is needed, it can be set at the top level. This makes the changes backwards compatible with the previous interface and allows the CSV string format to work without parsing nested structures (at least for this use case).

```yaml
data:
  prefix_tokens: 128
  prefix_count: 10
  prompt_tokens: 256
  output_tokens: 256
  samples: 1000
```

## Test Plan

- PR includes unit tests for all synthetic dataset changes (`pytest tests/unit/dataset`)
- Scenarios in the Details section can be used against a model server with prefix caching, and the cache rate can be confirmed by inspecting console output.

## Related Issues

- Resolves #232
- Closes #287

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [x] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Samuel Monson <[email protected]>
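The bucket-weight arithmetic in the prefix-bucket commit above can be checked with a small sketch. This is only the expected-proportion calculation, not the dataset generator itself; `bucket_plan` is a hypothetical helper, and the fallback values for a missing `bucket_weight` or `prefix_count` are assumptions.

```python
def bucket_plan(buckets, samples):
    """Estimate how many samples each bucket covers and how many share a prefix."""
    # Assumption: a missing bucket_weight counts as 100 and a missing
    # prefix_count counts as 1; the real defaults are not shown in the text.
    weights = [b.get("bucket_weight", 100) for b in buckets]
    total_weight = sum(weights)
    plan = []
    for bucket, weight in zip(buckets, weights):
        expected = samples * weight / total_weight
        plan.append(
            {
                "prefix_tokens": bucket["prefix_tokens"],
                "expected_samples": expected,
                "samples_per_prefix": expected / bucket.get("prefix_count", 1),
            }
        )
    return plan


single = bucket_plan([{"prefix_tokens": 2048, "prefix_count": 32}], samples=1024)
split = bucket_plan(
    [
        {"prefix_tokens": 2048, "bucket_weight": 40},
        {"prefix_tokens": 0, "bucket_weight": 60},
    ],
    samples=1000,
)
print(single[0]["samples_per_prefix"])  # 32.0
print([b["expected_samples"] for b in split])  # [400.0, 600.0]
```

The first call reproduces the "each prefix will apply to 32 samples" figure, and the second reproduces the 40%/60% split from the third example.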

## Summary

Fix to parsing rc ref in CI

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>


## Summary

This is the same fix as #389 but applied to the RC workflow rather than the release workflow, as was the original intent with #389. Both workflows need this change, so not reverting the other one.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>

## Summary

## Details

- [ ]

## Test Plan

-

## Related Issues

- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>


## Summary

Makes the `max_tokens` request key configurable through an environment variable per endpoint type. Defaults to `max_tokens` for legacy `completions` and `max_completion_tokens` for `chat/completions`.

## Details

- Add the `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` config option, which is a dict mapping from route name -> output tokens key. Default is `{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}`

## Test Plan

-

## Related Issues

- Closes #395
- Closes #269
- Related #210

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

---------

Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Samuel Monson <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
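A minimal sketch of the route-to-key mapping described in the `max_tokens` commit above. The default dict is copied from the commit message; `with_output_limit` is a hypothetical helper, and how GuideLLM actually reads `GUIDELLM__OPENAI__MAX_OUTPUT_KEY` from the environment is not shown here.

```python
# Default mapping quoted in the commit message: route name -> output tokens key.
DEFAULT_MAX_OUTPUT_KEY = {
    "text_completions": "max_tokens",
    "chat_completions": "max_completion_tokens",
}


def with_output_limit(body, route, limit, key_map=DEFAULT_MAX_OUTPUT_KEY):
    """Return a copy of the request body with the limit stored under the route's key."""
    updated = dict(body)
    updated[key_map[route]] = limit
    return updated


chat_body = with_output_limit({"model": "m"}, "chat_completions", 256)
text_body = with_output_limit({"model": "m"}, "text_completions", 256)
# Overriding the mapping, e.g. for a server that expects the legacy key on chat.
legacy_chat = with_output_limit(
    {"model": "m"}, "chat_completions", 256, {"chat_completions": "max_tokens"}
)
```

Keeping the mapping per route lets a deployment override one endpoint's key without affecting the other.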

… package and CLI pathways (#414)

## Summary

Changed the benchmarking entrypoint to take in an Args object, which is now used to load scenarios. It enables a single source of truth in addition to being able to save the exact configurations in the report output.

## Details

- [ ]

## Test Plan

-

## Related Issues

- Resolves #

---

- [ ] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>

…odec (#411)

## TODO

- [ ] ~~More flexible version locking in multimodal extras group~~
  - Goal with this was to add locking for different torchcodec/torch versions, but honestly it's not worth the hassle
- [x] Check for multi-modal libs being installed
- [ ] More testing on `encode_audio`

## Summary

Replaces audio processing libraries with `torchcodec`, which eliminates 19 dependencies and brings us in line with what HuggingFace `datasets` is doing.

## Details

-

## Test Plan

- Run against audio server with

```bash
guidellm benchmark run \
  --target http://localhost:8000 \
  --profile "synchronous" \
  --max-requests 20 \
  --request-type "audio_transcriptions" \
  --data "openslr/librispeech_asr" \
  --data-args '{"name": "clean", "split": "test"}'
```

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>

Signed-off-by: Jared O'Connell <[email protected]>

## Summary

Adds a `tox` env for updating the lock file. Also allows args for mypy env.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

## Summary

Various type fixes with the goal of not breaking anything.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [x] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)


Signed-off-by: Samuel Monson <[email protected]>

## Summary

Turns the `guidellm[multimodal]` extras group into `guidellm[audio]` and `guidellm[vision]`.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

Signed-off-by: Samuel Monson <[email protected]>

## Summary

Install all extras in the container and add `ffmpeg`.

---

- [x] "I certify that all code in this PR is my own, except as noted below."

## Use of AI

- [ ] Includes AI-assisted code completion
- [ ] Includes code generated by an AI application
- [ ] Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes `## WRITTEN BY AI ##`)

tukwila and others added 2 commits September 22, 2025 14:47

first draft

8159ca7

Signed-off-by: guangli.bao <[email protected]>

Add formatting to json file with metrics

0701389

Signed-off-by: psydok <[email protected]>

sjmonson commented Sep 26, 2025


sjmonson force-pushed the features/refactor/base-draft branch from 4c4ea5d to aa81de8 on September 30, 2025 15:19

sjmonson force-pushed the features/refactor/multiturn branch from 9ae0532 to cd43b2c on September 30, 2025 15:39

sjmonson marked this pull request as ready for review September 30, 2025 18:26

markurtz and others added 7 commits October 1, 2025 08:05

Initial state for datasets rework to enable multimodal and more compl…

730eeb1

…icated combinations

Merge branch 'main' into add_json_formatiing

c32896c

Merge branch 'main' into example_simulator

d1297fe

Merge branch 'main' into example_simulator

2c0d993

sjmonson changed the title ~~[GuideLLM Refactor] Multi-Turn Rework~~ [POST GuideLLM Refactor] Multi-Turn Rework Oct 2, 2025

sjmonson marked this pull request as draft October 2, 2025 20:41

DaltheCow and others added 12 commits October 3, 2025 10:35

Simplifications for new data pathways and reenablement of completions…

bbca65a

… and chat completions pathways

Fix audio pathways so requests work

616ef92

Fixed quality errors

87ba006

Many of the quality errors are due to using the older union style, and have appeared due to the upgrade of the minimum Python version from 3.9 to 3.10.

Signed-off-by: Jared O'Connell <[email protected]>

Run auto-formatter

1bd8846

Signed-off-by: Jared O'Connell <[email protected]>

Fix remaining ruff errors

1e8974c

Signed-off-by: Jared O'Connell <[email protected]>

sjmonson force-pushed the features/refactor/base-draft branch from aa81de8 to 48d1b95 on October 10, 2025 16:39

markurtz and others added 27 commits October 16, 2025 15:42

Replace pydub, librosa, and soundfile with torchcodec

a401165

Signed-off-by: Samuel Monson <[email protected]>

Add all group for extras

23d65ed

Signed-off-by: Samuel Monson <[email protected]>

Fix lock

5a768f8

Signed-off-by: Samuel Monson <[email protected]>

Rewrite encode_audio to use torchcodec

ec7071b

Signed-off-by: Samuel Monson <[email protected]>

Dump raw bytes not tensor

aee230c

Signed-off-by: Samuel Monson <[email protected]>

Code pathway cleanup

c1340b4

Signed-off-by: Samuel Monson <[email protected]>

Defer multimodal imports

c8e9ff9

Signed-off-by: Samuel Monson <[email protected]>

Apply Copilot fixes

6d036f8

Signed-off-by: Samuel Monson <[email protected]>

Bump torchcodec version

cf5a2e3

Signed-off-by: Samuel Monson <[email protected]>

Add tox lockfile updater

ad192cb

Signed-off-by: Samuel Monson <[email protected]>

Allow arguments for tox type checks

b65c6ab

Signed-off-by: Samuel Monson <[email protected]>

Fix mock server type errors

5b38f40

Signed-off-by: Jared O'Connell <[email protected]>

More type fixes

cb36c6a

Signed-off-by: Jared O'Connell <[email protected]>

Address utility and presentation type errors

1ffe53a

Signed-off-by: Jared O'Connell <[email protected]>

Fix type errors in extras

48769c2

Signed-off-by: Jared O'Connell <[email protected]>

Merge branch 'main' into features/refactor/base

f9af34d

Full refactor of GuideLLM (#351)

e787cc1

## Summary

TODO

## Details

TODO

## Test Plan

TODO

## Related Issues

TODO

Split multimodal group into vision and audio

af6b6b8

Signed-off-by: Samuel Monson <[email protected]>

Ensure all optional dependencies are in container

e0fc2e5

Signed-off-by: Samuel Monson <[email protected]>

Add some nice utilities to the image

8bba4df

Signed-off-by: Samuel Monson <[email protected]>

Add ffmpeg for audio

58665b6

Signed-off-by: Samuel Monson <[email protected]>

sjmonson force-pushed the features/refactor/multiturn branch from cd43b2c to 9669983 on October 20, 2025 16:59

sjmonson added 2 commits October 20, 2025 15:23

Generate synthetic data as multi-turn

9e2b7de

Hack multiturn into dataset formatters

fad9e9c

7 participants
